ci: trim PR matrix to PY3.12 + CUDA 12.9.1 only (debug only)#2120

Draft
Andy-Jost wants to merge 3 commits into NVIDIA:main from Andy-Jost:ajost/test-ci-py312-cuda129-hang

Conversation

@Andy-Jost
Contributor

Summary

Temporary debugging PR. Cuts the pull-request test matrix in
ci/test-matrix.yml down to a single Python 3.12 + CUDA 12.9.1 row per
platform/arch (linux-64 L4, linux-aarch64 A100, win-64 L4) so the
TestIpcReexport::test_main[DeviceMR] hang seen on the r595-driver
runners (e.g. https://github.com/NVIDIA/cuda-python/actions/runs/26162055334)
reproduces fast without burning compute on configurations we already
know are green. The nightly matrices are intentionally untouched.

This is not intended to merge; it exists only to drive CI iterations
while we investigate the hang. The patch is fully self-contained and
reverts cleanly.

Test plan

  • Linux-64 L4 PY3.12 / CUDA 12.9.1 job reproduces the
    TestIpcReexport::test_main[DeviceMR] hang (or surfaces a real
    failure once a fix lands).
  • Linux-aarch64 A100 PY3.12 / CUDA 12.9.1 job behaves the same.
  • Windows L4 PY3.12 / CUDA 12.9.1 job stays green (a sanity check that
    trimming the matrix didn't break the workflow plumbing).
  • No nightly job kicked off for this PR.
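
The first test-plan item distinguishes a hang from a real failure. A minimal sketch of how that distinction could be made when running the test by hand, assuming GNU coreutils `timeout` is available (it exits with status 124 when the limit is hit). The helper name `classify_run` and the time limits are hypothetical and are not part of this PR:

```shell
# classify_run LIMIT_SECONDS CMD [ARGS...]
# Hypothetical helper: runs CMD under GNU `timeout` and prints
# "pass", "hang", or "fail" depending on the exit status.
classify_run() {
  limit="$1"
  shift
  status=0
  # Discard the command's output; only the exit status matters here.
  timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
  case "$status" in
    0)   echo "pass" ;;   # command finished successfully
    124) echo "hang" ;;   # GNU timeout killed it at the limit
    *)   echo "fail" ;;   # command exited on its own with an error
  esac
}
```

For example, `classify_run 600 pytest -k "TestIpcReexport and test_main and DeviceMR"` would print `hang` if the test is still running after ten minutes; the 600-second limit is an arbitrary choice for illustration.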

Temporary debugging change: cut the linux and windows pull-request
matrices down to a single Python 3.12 + CUDA 12.9.1 row each so the
TestIpcReexport hang on r595-driver runners reproduces quickly without
also burning compute on unrelated configurations. The nightly matrices
are intentionally untouched. Revert before merging.
@Andy-Jost Andy-Jost added the CI/CD (CI/CD infrastructure) and experiment (Describes an investigation or measurement) labels May 21, 2026
@copy-pr-bot
Contributor

copy-pr-bot Bot commented May 21, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@Andy-Jost
Contributor Author

/ok to test

@Andy-Jost
Contributor Author

/ok to test

@github-actions

Wire the existing SETUP_SANITIZER gate behind a literal `false &&` so
every PR test job exports SETUP_SANITIZER=0 and SANITIZER_CMD=, which
makes run-tests invoke pytest directly instead of under
compute-sanitizer. Pairs with the matrix trim in this branch to test the
hypothesis that the TestIpcReexport hang on r595-driver runners is
specific to running pytest under compute-sanitizer. Revert before
merging.
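
The effect of the `false &&` prefix can be sketched in isolation. This is an illustration only: the real gate condition is not shown in this conversation, so `original_gate`, `WANT_SANITIZER`, and `build_test_cmd` are hypothetical stand-ins, and the bare `compute-sanitizer` command omits whatever flags the workflow actually passes.

```shell
# Stand-in for whatever condition the workflow normally evaluates
# (assumption: the real condition is not shown here).
original_gate() {
  [ "${WANT_SANITIZER:-1}" = "1" ]
}

# A literal `false &&` short-circuits the condition, so original_gate
# is never evaluated and the else branch always runs.
if false && original_gate; then
  SETUP_SANITIZER=1
  SANITIZER_CMD="compute-sanitizer"
else
  SETUP_SANITIZER=0
  SANITIZER_CMD=""
fi
export SETUP_SANITIZER SANITIZER_CMD

# Hypothetical run-tests style command builder. SANITIZER_CMD is left
# unquoted on purpose so that an empty value vanishes from the command
# line, leaving a plain pytest invocation.
build_test_cmd() {
  echo ${SANITIZER_CMD} pytest "$@"
}
```

With the gate forced off, `build_test_cmd -x` prints `pytest -x`. Removing the `false &&` prefix restores the original behavior, which is why the change reverts cleanly.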
@Andy-Jost
Contributor Author

/ok to test

Labels

CI/CD (CI/CD infrastructure), experiment (Describes an investigation or measurement)
